2024-05-09 07:19:27
Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks
Mohamad Fazelnia, Viktoria Koscinski, Spencer Herzog, Mehdi Mirakhorli
https://arxiv.org/abs/2405.05135
CourseGPT-zh: an Educational Large Language Model Based on Knowledge Distillation Incorporating Prompt Optimization
Zheyan Qu, Lu Yin, Zitong Yu, Wenbo Wang, Xing Zhang
https://arxiv.org/abs/2405.04781
Enhancing Holonic Architecture with Natural Language Processing for System of Systems
Muhammad Ashfaq, Ahmed R. Sadik, Tommi Mikkonen, Muhammad Waseem, Niko Mäkitalo
https://arxiv.org/abs/2405.05365
Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models
Beichen Huang, Xingyu Wu, Yu Zhou, Jibin Wu, Liang Feng, Ran Cheng, Kay Chen Tan
https://arxiv.org/abs/2404.06290
This https://arxiv.org/abs/2403.02308 has been replaced.
link: https://scholar.google.com/scholar?q=a
Chart What I Say: Exploring Cross-Modality Prompt Alignment in AI-Assisted Chart Authoring
Nazar Ponochevnyi, Anastasia Kuzminykh
https://arxiv.org/abs/2404.05103
Understanding Language Modeling Paradigm Adaptations in Recommender Systems: Lessons Learned and Open Challenges
Lemei Zhang, Peng Liu, Yashar Deldjoo, Yong Zheng, Jon Atle Gulla
https://arxiv.org/abs/2404.03788
🔌 Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges
#ai
This https://arxiv.org/abs/2403.15417 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
Privacy-Aware Semantic Cache for Large Language Models
Waris Gill (Virginia Tech, USA), Mohamed Elidrisi (Cisco, USA), Pallavi Kalapatapu (Cisco, USA), Ali Anwar (University of Minnesota, Minneapolis, USA), Muhammad Ali Gulzar (Virginia Tech, USA)
https://arxiv.org/abs/2403.02694
This https://arxiv.org/abs/2404.05694 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2405.02937 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2404.13236 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDC_…
NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator
Mohit Upadhyay, Rohan Juneja, Weng-Fai Wong, Li-Shiuan Peh
https://arxiv.org/abs/2405.04206
This https://arxiv.org/abs/2403.18276 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
This https://arxiv.org/abs/2401.01508 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
🚨 Reminder to submit your best work on #Ukrainian to the 3rd Ukrainian Natural Language Processing workshop! #nlproc #callForPapers
Extended deadline: March 4!
This https://arxiv.org/abs/2404.06162 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2401.01751 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDL_…
ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, Yonghong Tian
https://arxiv.org/abs/2402.16445 https://arxiv.org/pdf/2402.16445
arXiv:2402.16445v1 Announce Type: new
Abstract: Large Language Models (LLMs), including GPT-x and LLaMA2, have achieved remarkable performance in multiple Natural Language Processing (NLP) tasks. Under the premise that protein sequences constitute the protein language, Protein Large Language Models (ProLLMs) trained on protein corpora excel at de novo protein sequence generation. However, as of now, unlike LLMs in NLP, no ProLLM is capable of multiple tasks in the Protein Language Processing (PLP) field. This prompts us to delineate the inherent limitations in current ProLLMs: (i) the lack of natural language capabilities, (ii) insufficient instruction understanding, and (iii) high training resource demands. To address these challenges, we introduce a training framework to transform any general LLM into a ProLLM capable of handling multiple PLP tasks. Specifically, our framework utilizes low-rank adaptation and employs a two-stage training approach, and it is distinguished by its universality, low overhead, and scalability. Through training under this framework, we propose the ProLLaMA model, the first known ProLLM to handle multiple PLP tasks simultaneously. Experiments show that ProLLaMA achieves state-of-the-art results in the unconditional protein sequence generation task. In the controllable protein sequence generation task, ProLLaMA can design novel proteins with desired functionalities. In the protein property prediction task, ProLLaMA achieves nearly 100% accuracy across many categories. The latter two tasks are beyond the reach of other ProLLMs. Code is available at https://github.com/Lyu6PosHao/ProLLaMA.
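The low-rank adaptation the abstract mentions can be sketched in a few lines: instead of updating a full d_out x d_in weight matrix W, one trains two small factors B (d_out x r) and A (r x d_in) and applies W + alpha * (B @ A). The sketch below is illustrative pure Python, not the paper's code; all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_effective_weight(w, a, b, alpha=1.0):
    """Return W + alpha * (B @ A): frozen weight plus a low-rank update.

    w is the d_out x d_in frozen weight; b is d_out x r and a is r x d_in
    with r much smaller than d_in, so only (d_out + d_in) * r parameters
    are trained instead of d_out * d_in.
    """
    delta = matmul(b, a)  # rank-r update, same shape as w
    return [[w[i][j] + alpha * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]
```

With rank r = 1 on a 2x3 weight, the adapter adds only 5 trainable numbers in place of 6, and the saving grows quickly with matrix size; this is what makes the framework's "low overhead" claim plausible.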
Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports
Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan
https://arxiv.org/abs/2404.06162
Exploring the Improvement of Evolutionary Computation via Large Language Models
Jinyu Cai, Jinglue Xu, Jialong Li, Takuto Yamauchi, Hitoshi Iba, Kenji Tei
https://arxiv.org/abs/2405.02876
Zero-shot LLM-guided Counterfactual Generation for Text
Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu
https://arxiv.org/abs/2405.04793
Structural Balance in Real-World Social Networks: Incorporating Direction and Transitivity in Measuring Partial Balance
Rezvaneh Rezapour, Ly Dinh, Lan Jiang, Jana Diesner
https://arxiv.org/abs/2405.02798
This https://arxiv.org/abs/2310.12357 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
Youpeng Zhao, Di Wu, Jun Wang
https://arxiv.org/abs/2403.17312
ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish
Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas
https://arxiv.org/abs/2404.06367
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
Lucas-Andreï Thil, Mirela Popa, Gerasimos Spanakis
https://arxiv.org/abs/2405.00516
CARE-SD: Classifier-based analysis for recognizing and eliminating stigmatizing and doubt marker labels in electronic health records: model development and validation
Drew Walker, Annie Thorne, Sudeshna Das, Jennifer Love, Hannah LF Cooper, Melvin Livingston III, Abeed Sarker
https://arxiv.org/abs/2405.05204
ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model
Hongruixuan Chen, Jian Song, Chengxi Han, Junshi Xia, Naoto Yokoya
https://arxiv.org/abs/2404.03425
This https://arxiv.org/abs/2305.14461 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDS_…
Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model
Yongqi Zhao, Wenbo Xiao, Tomislav Mihalj, Jia Hu, Arno Eichberger
https://arxiv.org/abs/2404.16147
This https://arxiv.org/abs/2308.06013 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIT_…
Quantitative Tools for Time Series Analysis in Natural Language Processing: A Practitioners Guide
W. Benedikt Schmal
https://arxiv.org/abs/2404.18499
Application of GPT Language Models for Innovation in Activities in University Teaching
Manuel de Buenaga, Francisco Javier Bueno
https://arxiv.org/abs/2403.14694
Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning
Spyridon Chavlis, Panayiota Poirazi
https://arxiv.org/abs/2404.03708
Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages
Sankalp Bahad, Pruthwik Mishra, Karunesh Arora, Rakesh Chandra Balabantaray, Dipti Misra Sharma, Parameswari Krishnamurthy
https://arxiv.org/abs/2405.04829
Improving Long Text Understanding with Knowledge Distilled from Summarization Model
Yan Liu, Yazheng Yang, Xiaokang Chen
https://arxiv.org/abs/2405.04955
Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey
Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, Dorien Herremans
https://arxiv.org/abs/2402.17467
Resource Allocation in Large Language Model Integrated 6G Vehicular Networks
Chang Liu, Jun Zhao
https://arxiv.org/abs/2403.19016
Finding fake reviews in e-commerce platforms by using hybrid algorithms
Mathivanan Periasamy, Rohith Mahadevan, Bagiya Lakshmi S, Raja CSP Raman, Hasan Kumar S, Jasper Jessiman
https://arxiv.org/abs/2404.06339
mALBERT: Is a Compact Multilingual BERT Model Still Worth It?
Christophe Servan (ILES, STL), Sahar Ghannay (LISN), Sophie Rosset (LISN)
https://arxiv.org/abs/2403.18338
This https://arxiv.org/abs/2404.18255 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models
Bowen Zhang, Kehua Chang, Chunping Li
https://arxiv.org/abs/2404.03921
From ChatGPT, DALL-E 3 to Sora: How has Generative AI Changed Digital Humanities Research and Services?
Jiangfeng Liu, Ziyi Wang, Jing Xie, Lei Pei
https://arxiv.org/abs/2404.18518
Comparative Analysis of Retrieval Systems in the Real World
Dmytro Mozolevskyi, Waseem AlShikh
https://arxiv.org/abs/2405.02048
This https://arxiv.org/abs/2401.05200 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csHC_…
This https://arxiv.org/abs/2402.05812 has been replaced.
link: https://scholar.google.com/scholar?q=a
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu, Xu Tan, Zili Wang, Rui Wang, Xiaobing Li, Maosong Sun
https://arxiv.org/abs/2402.19155
VI-OOD: A Unified Representation Learning Framework for Textual Out-of-distribution Detection
Li-Ming Zhan, Bo Liu, Xiao-Ming Wu
https://arxiv.org/abs/2404.06217
This https://arxiv.org/abs/2402.17652 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDC_…
Using Large Language Models for Natural Language Processing Tasks in Requirements Engineering: A Systematic Guideline
Andreas Vogelsang, Jannik Fischbach
https://arxiv.org/abs/2402.13823
Neural Architecture Search for Sentence Classification with BERT
Philip Kenneweg, Sarah Schröder, Barbara Hammer
https://arxiv.org/abs/2403.18547
Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest
Walid Hariri
https://arxiv.org/abs/2404.01800
This https://arxiv.org/abs/2401.14295 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Navigator: A Decentralized Scheduler for Latency-Sensitive ML Workflows
Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg
https://arxiv.org/abs/2402.17652
This https://arxiv.org/abs/2308.11131 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
This https://arxiv.org/abs/2403.11894 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models
Mouhamed Amine Bouchiha, Quentin Telnoff, Souhail Bakkali, Ronan Champagnat, Mourad Rabah, Mickaël Coustaty, Yacine Ghamri-Doudane
https://arxiv.org/abs/2404.13236
This https://arxiv.org/abs/2311.01020 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval"
Andrew Parry, Debasis Ganguly, Manish Chandra
https://arxiv.org/abs/2405.01116
Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving
Gulsum Yigit, Mehmet Fatih Amasyali
https://arxiv.org/abs/2404.03938
This https://arxiv.org/abs/2310.03128 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
Sana Ebrahimi, Kaiwen Chen, Abolfazl Asudeh, Gautam Das, Nick Koudas
https://arxiv.org/abs/2403.00198
Large Language Model Supply Chain: A Research Agenda
Shenao Wang, Yanjie Zhao, Xinyi Hou, Haoyu Wang
https://arxiv.org/abs/2404.12736
This https://arxiv.org/abs/2401.05632 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2403.09891 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Gender Bias in Large Language Models across Multiple Languages
Jinman Zhao, Yitian Ding, Chen Jia, Yining Wang, Zifan Qian
https://arxiv.org/abs/2403.00277
Analyzing the Role of Semantic Representations in the Era of Large Language Models
Zhijing Jin, Yuen Chen, Fernando Gonzalez, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Schölkopf, Mona Diab
https://arxiv.org/abs/2405.01502
This https://arxiv.org/abs/2404.18759 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
Yucheng Hu, Yuxing Lu
https://arxiv.org/abs/2404.19543 https://arxiv.org/pdf/2404.19543
arXiv:2404.19543v1 Announce Type: new
Abstract: Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge. To mitigate these, recent methodologies have integrated information retrieved from external resources with LLMs, substantially enhancing their performance across NLP tasks. This survey paper addresses the absence of a comprehensive overview on Retrieval-Augmented Language Models (RALMs), both Retrieval-Augmented Generation (RAG) and Retrieval-Augmented Understanding (RAU), providing an in-depth examination of their paradigm, evolution, taxonomy, and applications. The paper discusses the essential components of RALMs, including Retrievers, Language Models, and Augmentations, and how their interactions lead to diverse model structures and applications. RALMs demonstrate utility in a spectrum of tasks, from translation and dialogue systems to knowledge-intensive applications. The survey includes several evaluation methods of RALMs, emphasizing the importance of robustness, accuracy, and relevance in their assessment. It also acknowledges the limitations of RALMs, particularly in retrieval quality and computational efficiency, offering directions for future research. In conclusion, this survey aims to offer a structured insight into RALMs, their potential, and the avenues for their future development in NLP. The paper is supplemented with a Github Repository containing the surveyed works and resources for further study: https://github.com/2471023025/RALM_Survey.
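The retriever / language-model / augmentation loop the survey describes can be shown with a toy pipeline: rank documents against the query, then splice the top hits into the prompt before the model is called. The corpus, scoring function, and prompt template below are illustrative stand-ins, not any system from the surveyed papers.

```python
def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (a toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def augment_prompt(query, docs):
    """Prepend the retrieved evidence to the query before calling the LLM."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RALMs combine a retriever with a language model.",
    "Hallucination is reduced by grounding answers in retrieved text.",
]
docs = retrieve("what reduces hallucination", corpus, k=1)
prompt = augment_prompt("what reduces hallucination", docs)
```

Real RALMs replace the overlap score with dense or sparse retrieval and feed the augmented prompt to an actual model, but the control flow (retrieve, augment, generate) is the same shape as this sketch.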
This https://arxiv.org/abs/2404.19048 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2311.07978 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2404.18255 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations
Rémy Decoupes, Roberto Interdonato, Mathieu Roche, Maguelonne Teisseire, Sarah Valentin
https://arxiv.org/abs/2404.17401
2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion
Dongsheng Wang, Xiaoqin Feng, Zeming Liu, Chuan Wang
https://arxiv.org/abs/2404.17122
This https://arxiv.org/abs/2305.12829 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2403.16432 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2402.08015 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models
Bradley P. Allen, Paul T. Groth
https://arxiv.org/abs/2404.17000
Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM
Xuan Zhang, Wei Gao
https://arxiv.org/abs/2404.17283
Surveying the Dead Minds: Historical-Psychological Text Analysis with Contextualized Construct Representation (CCR) for Classical Chinese
Yuqi Chen, Sixuan Li, Ying Li, Mohammad Atari
https://arxiv.org/abs/2403.00509
This https://arxiv.org/abs/2402.14379 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
From explainable to interpretable deep learning for natural language processing in healthcare: how far from reality?
Guangming Huang, Yunfei Long, Yingya Li, Giorgos Papanastasiou
https://arxiv.org/abs/2403.11894
Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge
Ansh Arora, Xuanli He, Maximilian Mozes, Srinibas Swain, Mark Dras, Qiongkai Xu
https://arxiv.org/abs/2402.19334
Tokenization Is More Than Compression
Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner
https://arxiv.org/abs/2402.18376
Prompting Towards Alleviating Code-Switched Data Scarcity in Under-Resourced Languages with GPT as a Pivot
Michelle Terblanche, Kayode Olaleye, Vukosi Marivate
https://arxiv.org/abs/2404.17216
Improving Legal Judgement Prediction in Romanian with Long Text Encoders
Mihai Masala, Traian Rebedea, Horia Velicu
https://arxiv.org/abs/2402.19170
ChatGPT Alternative Solutions: Large Language Models Survey
Hanieh Alipour, Nick Pendar, Kohinoor Roy
https://arxiv.org/abs/2403.14469
Exploring Safety Generalization Challenges of Large Language Models via Code
Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma
https://arxiv.org/abs/2403.07865